A Large-scale Test Set for Author Disambiguation
Authors
Abstract
Similar resources
Citation-based bootstrapping for large-scale author disambiguation
We present a new, two-stage, self-supervised algorithm for author disambiguation in large bibliographic databases. In the first “bootstrap” stage, a collection of high-precision features is used to bootstrap a training set with positive and negative examples of coreferring authors. A supervised feature-based classifier is then trained on the bootstrap clusters and used to cluster the authors in ...
Authormagic: A Concept for Author Disambiguation in Large-Scale Digital Libraries
Author name ambiguities distort the quality of information discovery in digital libraries. These ambiguities also contribute to the inaccurate attribution of authorship to individual researchers. The latter is especially delicate in research evaluation. To solve this issue, many algorithmic bulk disambiguation approaches have been proposed in the literature. However, no algorithmic approach can...
Efficient Name Disambiguation for Large-Scale Databases
Name disambiguation can occur when one is seeking a list of publications of an author who has used different name variations and when there are multiple other authors with the same name. We present an efficient integrative framework for solving the name disambiguation problem: a blocking method retrieves candidate classes of authors with similar names and a clustering method, DBSCAN, clusters p...
A Large-Scale Multilingual Disambiguation of Glosses
Linking concepts and named entities to knowledge bases has become a crucial Natural Language Understanding task. In this respect, recent works have shown the key advantage of exploiting textual definitions in various Natural Language Processing applications. However, to date there are no reliable large-scale corpora of sense-annotated textual definitions available to the research community. In ...
Author Name Disambiguation for PubMed
Log analysis shows that PubMed users frequently use author names in queries for retrieving scientific literature. However, author name ambiguity may lead to irrelevant retrieval results. To improve the PubMed user experience with author name queries, we designed an author name disambiguation system consisting of similarity estimation and agglomerative clustering. A machine-learning method was e...
Journal
Journal title: The Journal of the Korea Contents Association
Year: 2009
ISSN: 1598-4877
DOI: 10.5392/jkca.2009.9.11.455